Systems and methods for automated end-to-end eye screening, monitoring and diagnosis

ABSTRACT

System and method for fully automated end-to-end eye screening with automated medical acquisition and analysis. The system includes an eye imaging device, a mechanism that moves the imaging device, a computing platform that guides the movement mechanism, a user interface, and an electronic display device and/or printer to provide the screening, monitoring, and/or diagnosis report.

PRIORITY INFORMATION

This application is a continuation of U.S. patent application Ser. No. 16/234,301, filed Dec. 27, 2018, which claims the benefit of priority under 35 U.S.C. § 119(e) to U.S. Provisional Application No. 62/610,802, filed Dec. 27, 2017, the disclosures of both of which are incorporated herein by reference in their entireties.

TECHNICAL FIELD

The subject matter described herein relates generally to fully automated end-to-end screening, monitoring and diagnosis of systemic and/or retinal diseases and disorders with automated medical acquisition and analysis. More particularly, it relates to the use of robotic/mechatronic systems/methods for automated image acquisition, where the imaging device is moved to image the required anatomical structures, and image analysis systems/methods for automated generation of screening, monitoring, and diagnosis outcomes.

BACKGROUND

Retinal fundus photography (often referred to as fundus photography and including color fundus photography, scanning laser ophthalmoscopy based ultra-widefield photography, and optical coherence tomography modalities) is frequently used as a screening, monitoring, or diagnostic tool for multiple eye diseases, such as diabetic retinopathy, glaucoma, and age-related macular degeneration. Screening, monitoring, and diagnosis using fundus photographs allows patients without access to eye-care specialists to be screened for multiple diseases. Fundus photography and any subsequent analysis can only be effective with the consistent capture of good quality photographs, which in many cases requires a highly trained and experienced technician.

A set of fundus photographs belonging to a particular patient is considered to be of good/gradable quality when the fundus photographs:

-   capture the regions of the retina that are required for the screening/monitoring/diagnosis of a given disease;
-   have sufficient illumination and clarity to allow for the examination of the anatomical and pathological features of interest; and
-   are free of artifacts including but not limited to, eyelashes, dust, and smudges on the lens.

However, the number of technicians trained for fundus photography is orders of magnitude smaller than that needed to screen the large and growing population that requires screening (e.g. 415 million diabetic patients worldwide need annual diabetic retinopathy screening. See, INTERNATIONAL DIABETES FEDERATION: IDF Diabetes Atlas, 7th edn. Brussels, Belgium: International Diabetes Federation, 2015 [INTE15]). This gap in the number of trained technicians can only be met by a system that fully automates the capture and analysis of fundus photographs for screening, thus eliminating the need for a trained technician-in-the-loop. Such systems can potentially make eye screening more efficient, cost-effective, reproducible, and accessible.

A need therefore exists to develop a device and method that provide ease of use, accuracy, speed, portability and affordability, and that overcome these and other limitations of the prior art.

SUMMARY

This summary and the following detailed description should be interpreted as complementary parts of an integrated disclosure, which parts may include redundant subject matter and/or supplemental subject matter. An omission in either section does not indicate priority or relative importance of any element described in the integrated application. Differences between the sections may include supplemental disclosures of alternative embodiments, additional details, or alternative descriptions of identical embodiments using different terminology, as should be apparent from the respective disclosures.

System and method for fully automated end-to-end screening with automated medical acquisition and analysis are provided in the present disclosure. Generally, the present disclosure includes a system and method that relate to automated capture of gradable/screenable fundus photographs or video sequences using eye imaging devices attached to a robotic moving platform; automated analysis of the captured photographs or videos to screen for, monitor, and diagnose particular diseases; and provision of a screening, monitoring, and diagnosis outcome. The present disclosure may screen a patient for multiple diseases with easy-to-use user interfaces, including but not limited to, the push of a button/switch, touch-activated electronic display (touch-screen) interface inputs, pressure sensors/activators in a head rest, and/or voice-activated controls. In some embodiments, the eye imaging device may be an existing off-the-shelf and portable camera unit. In some embodiments, the robotic moving platform may be a level surface or an arm.

In some aspects, the system and method meet or exceed the following requirements:

Ease of use: The system and method are easy to operate, for example, using an intuitive user interface. This reduces the training time and educational background necessary for the operators/users.

Accuracy: The system and method may have a 90% or higher rate of success in acquiring gradable photographs in a fully-automated mode. In some embodiments, the system and method may allow for a semi-automatic or fully-manual mode for override by the operator/user. One example is when certain gradable photographs are desired.

Speed: The system and method may capture photographs of both eyes of a patient in a short amount of time, for example, under 2 minutes. This helps improve patient experience and comfort.

Portability and affordability: The system and method may be compact, portable, and relatively inexpensive to enable large-scale deployment in a variety of clinical settings.

In some embodiments of such a system, a patient sits on a chair in front of a robotic system to which a portable imaging device is attached. When the operator, possibly untrained or minimally trained, activates the screening procedure using the user interfaces, the robotic moving platform moves the imaging device in a manner to obtain gradable photographs or video sequences of the eye and/or some of its anatomical structures. The system then analyzes these photographs or video sequences to screen for multiple diseases and presents a screening and diagnosis report/outcome, for example to the patient.

In some embodiments, the system is self-operated. Here, the untrained operator may be the patient being screened.

In some embodiments, the system captures the photographs using a hand-held imaging device operated by a trained photographer instead of the robotic system. The hand-held imaging device may be coupled with an electronic display to aid the operator in correctly aligning the imaging device.

In some embodiments, the fully automated system may have the following major components: an eye imaging device, a moving platform capable of moving the imaging device in 3-dimensional space, a computing device that enables the automated analysis, a user interface, and an optional electronic display device and/or printer to provide the screening, monitoring, and/or diagnosis report, for example to the patient.

In addition to, or as an alternative to, the fully automatic acquisition and analysis of photographs, certain embodiments of the device may also allow for a semi-automatic or fully manual override for photograph or video sequence acquisition by the operator/user, for example, when certain gradable photographs cannot be obtained in a fully automated mode. In the semi-automatic mode, the operator may perform simple tasks such as centering the imaging device's view on the target eye. In the fully manual mode, a set of manual controls is provided, with which the technician can steer the imaging device to center on the eye and move to the proper working distance to capture a gradable photograph.

The system may also include safety precautions, including but not limited to, a hard mechanical limit on the moving platform's range of motion and a programmed limit on the moving platform's range of motion. These safety precautions prevent the system from injuring the patient or the operator, and/or harming the device itself.

Other systems, devices, methods, features and advantages of the subject matter described herein will be or will become apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, devices, methods, features and advantages be included within this description, be within the scope of the subject matter described herein, and be protected by the accompanying claims. In no way should the features of the example embodiments be construed as limiting the appended claims, absent express recitation of those features in the claims.

BRIEF DESCRIPTION OF THE FIGURES

The details of the subject matter set forth herein, both as to its structure and operation, may be apparent by study of the accompanying figures, in which like reference numerals refer to like parts. The components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the subject matter. Moreover, all illustrations are intended to convey concepts, where relative sizes, shapes and other detailed attributes may be illustrated schematically rather than literally or precisely.

FIG. 1 illustrates an exemplary end-to-end eye screening with automated medical acquisition and analysis, in accordance with an embodiment of the present disclosure.

FIG. 2 illustrates an exemplary end-to-end eye screening with automated medical acquisition and analysis system where the imaging device is operated by a technician, in accordance with an embodiment of the present disclosure.

FIG. 3 illustrates an exemplary self-operated end-to-end eye screening with automated medical acquisition and analysis system, in accordance with an embodiment of the present disclosure.

FIG. 4 illustrates another exemplary end-to-end eye screening with automated medical acquisition and analysis, in accordance with an embodiment of the present disclosure.

FIG. 5 illustrates an exemplary hand-held system for end-to-end eye screening with automated medical acquisition and analysis, with a hand-held imaging device and a coupled electronic display, in accordance with an embodiment of the present disclosure.

FIG. 6 illustrates an exemplary high-level diagram of components of an end-to-end eye screening with automated medical acquisition and analysis, in accordance with an embodiment of the present disclosure.

FIGS. 7A and 7B illustrate exemplary process flow diagrams of an end-to-end eye screening with automated medical acquisition and analysis, in accordance with an embodiment of the present disclosure.

FIG. 7C illustrates exemplary positions of the imaging device at various phases of the image acquisition process, in accordance with an embodiment of the present disclosure.

FIG. 8 illustrates exemplary facial landmarking of a patient on a head rest, in accordance with an embodiment of the present disclosure.

FIG. 9 illustrates four exemplary views of the imaging device as it fixes on the target eye and progressively moves towards the correct working distance to capture gradable images, in accordance with an embodiment of the present disclosure.

DETAILED DESCRIPTION

Before the present subject matter is described in detail, it is to be understood that this disclosure is not limited to the particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present disclosure will be limited only by the appended claims.

FIGS. 1-9 illustrate embodiments of systems and methods for fully automated end-to-end eye screening with automated medical acquisition and analysis. Generally, the hardware components of the disclosure may comprise an eye imaging device (which, in some embodiments, may be an off-the-shelf camera), a mechanism that moves the imaging device, the computing platform that guides the movement mechanism, a user interface, and an electronic display device and/or printer to provide the screening, monitoring, and/or diagnosis report, for example to the patient. In some embodiments, the hardware components may be modified or implemented in multiple forms. In some embodiments, the present disclosure includes four main processing/control software procedures or modules (illustrated in FIG. 6): a user interface module to interact with an operator; a retinal image acquisition module to control the hardware and capture one or more images (or photographs) or video sequences; an analysis module to process the images/videos to screen for, monitor, or diagnose diseases; and a module to store, report, and perform necessary tasks given the outcome. A video sequence can be considered as a series of multiple images and therefore, henceforth in this disclosure, the use of image and images also includes video and video sequences. The feedback loop 610, illustrated as a dashed arrow in FIG. 6, can be optionally added to allow the system to recapture images or capture additional images based on whether the current images or portions of the current images provide sufficient data/evidence for the automated analysis to generate a screening/diagnosis outcome. In some embodiments, these modules can be presented in many forms; some may be coupled, omitted, or implemented in forms that are not entirely software. For example, in the hand-held form (as illustrated in FIG. 5), the system takes advantage of the technician's skills in positioning the hand-held imaging device, and the automated system evaluates whether the current image is of sufficient quality for automated analysis and, if not, prompts the technician to retake the image. It should be noted that a technician may also be referred to herein as an operator or user.

FIG. 2 illustrates an exemplary embodiment of the system of the present disclosure where the patient 210 sits on a chair in front of a robotic system 200 to which an imaging device 104 is attached. When the operator 220, possibly untrained or minimally trained, activates the method of the present disclosure using one of the input methods, the moving platform 112 moves the imaging device 104 in a manner to obtain gradable photographs. These photographs are then analyzed to screen for multiple eye diseases and a screening report/outcome is presented to the patient. In an aspect, the photographs are fundus photographs.

In another exemplary embodiment of this system, illustrated in FIG. 3, the system 300 may be self-operated. Here, the untrained operator may be the patient 310 being screened.

In FIG. 5, in another exemplary embodiment of the system, the photographs are captured using a hand-held imaging device 500 by a trained photographer 540 instead of the robotic system.

FIG. 6 illustrates high-level exemplary processing/control components of the system of the disclosure.

Hardware/Mechanical Components

FIG. 1 illustrates an exemplary embodiment of the mechanical components of the fully automated robotic version of the system 100 of the present disclosure. The main part is a moving platform 112 on which is mounted the imaging device 104. Both the moving platform 112 and the imaging device 104 are connected to the computing platform or computer 110. In this embodiment, the imaging device 104 may be a portable fundus camera.

The moving platform 112 may be coupled to a controller 106 that is connected to the computer 110 as well as to three sets of motors, each responsible for motion of the imaging device 104 along one of the spatial x-, y-, and z-axes. The connections allow the controller to relay motion-inducing commands from the processing/control system to the motors. The controller may filter these commands so that the hardware only executes commands that are either within desired safety limits (for example, when an axis is about to hit a hardware limit) or in accordance with the current dynamics of the system (for example, if the system wants to reverse a motor's direction while it is currently moving at a high speed, the next command should be one that first reduces the velocity to a smooth stop before proceeding to move in reverse).
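To make the command-filtering behavior concrete, the following is a minimal per-axis sketch; the class, parameter names, and units are illustrative assumptions rather than the disclosed implementation of controller 106.

```python
from dataclasses import dataclass

@dataclass
class AxisState:
    position: float   # current position along the axis (e.g., mm)
    velocity: float   # current velocity (e.g., mm/s)

class AxisCommandFilter:
    """Filters velocity commands for a single axis of the moving platform."""

    def __init__(self, min_pos: float, max_pos: float,
                 max_speed: float, max_delta_v: float):
        self.min_pos = min_pos            # programmed travel limits
        self.max_pos = max_pos
        self.max_speed = max_speed        # speed cap
        self.max_delta_v = max_delta_v    # largest velocity change per control tick

    def filter(self, state: AxisState, requested_velocity: float) -> float:
        # Clamp the request to the speed cap.
        v = max(-self.max_speed, min(self.max_speed, requested_velocity))
        # Refuse motion that would push the axis past a programmed limit.
        if (v > 0 and state.position >= self.max_pos) or \
           (v < 0 and state.position <= self.min_pos):
            return 0.0
        # Limit the velocity change per tick, so a reversal requested at high
        # speed is executed as a smooth deceleration to a stop before reversing.
        delta = v - state.velocity
        delta = max(-self.max_delta_v, min(self.max_delta_v, delta))
        return state.velocity + delta
```

In this sketch the filter is applied once per control tick for each of the x-, y-, and z-axes before the command is relayed to the corresponding motor set.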

To aid the acquisition process and enhance system safety, the moving platform 112 may also be equipped with other sensors to guide its motion, including but not limited to additional cameras, proximity sensors (e.g. infrared or laser), or attitude proprioceptive sensors (e.g. accelerometer, gyroscope). These sensors may be mounted on the platform, on the imaging device, separately on their own, or in other places, combinations, or configurations. A head rest 102, which may comprise parts intended for the patient to rest his/her forehead, chin, and/or cheekbones, may be included in the system to hold the patient's head steady for the duration of the image acquisition (photography) process. Furthermore, there may be other accessories that can be added, including but not limited to, a fixation target to direct the patient's gaze.

The system may include a set of manual controls, for example, joystick 108, to allow the operator to manually override the operation, if needed.

In some embodiments, the system may include a touch-activated electronic display or voice commands to assist the movement of the imaging device.

FIG. 4 illustrates another exemplary embodiment of the mechanical components of the fully automated robotic version of the system 400 of the present disclosure. System 400 may include moving platform 412 on which is mounted the imaging device 404. Both the moving platform 412 and the imaging device 404 are connected to the computing platform or computer 416. An electronic display 410 is connected to the imaging device 404 and the computing platform 416. In this embodiment, the imaging device 404 may be a portable fundus camera.

The moving platform 412 may be coupled to a controller (not shown) that is connected to the computer 416 as well as to three sets of motors, each responsible for motion of the imaging device 404 along one of the spatial x-, y-, and z-axes. The connections allow the controller to relay motion-inducing commands from the processing/control system to the motors. The controller may filter these commands so that the hardware only executes commands that are either within desired safety limits (for example, when an axis is about to hit a hardware limit) or in accordance with the current dynamics of the system (for example, if the system wants to reverse a motor's direction while it is currently moving at a high speed, the next command should be one that first reduces the velocity to a smooth stop before proceeding to move in reverse).

To aid the acquisition process and enhance system safety, the moving platform 412 may also be equipped with other sensors to guide its motion, including but not limited to additional cameras, proximity sensors (e.g. infrared or laser), or attitude proprioceptive sensors (e.g. accelerometer, gyroscope). These sensors may be mounted on the platform, on the imaging device, separately on their own, or in other places, combinations, or configurations. A head rest 402, which may comprise parts intended for the patient to rest his/her forehead, chin, and/or cheekbones, may be included in the system to hold the patient's head steady for the duration of the image acquisition (photography) process. Furthermore, there may be other accessories that can be added, including but not limited to, a fixation target to direct the patient's gaze.

The system may also include a set of manual controls, for example, joystick 408, to allow the operator to manually override the operation, if needed.

Processing/Control Components

An exemplary processing/control system 600 of the present disclosure (illustrated in FIG. 6) may comprise four sub-systems. A user interface sub-system 602 enables the operator to start the process and/or calibrate the system. The image acquisition sub-system 604 enables the control of the moving platform and the imaging device to capture the required photographs. The image analysis sub-system 606 analyzes the photographs/images captured to automatically provide one or more screening/monitoring/diagnostic outcomes. The feedback loop illustrated as a dashed arrow 610 in FIG. 6 can be performed to allow the system to recapture images or capture additional images based on whether the current images or portions of the current images provide sufficient evidence for the automated analysis to generate a screening/monitoring/diagnostic report at 608. Methods for performing image analysis are similar to the methods described in “Systems and Methods for Processing Retinal Images for Screening of Diseases or Abnormalities” (Solanki, Kaushal Mohanlal; Bhat Krupakar, Sandeep; Amai Ramachandra, Chaithanya; and Bhaskaranand, Malavika) [SBAB15]. The screening/diagnosis report sub-system 608 may output a report with the outcomes generated by the image analysis sub-system 606 to an electronic display device and/or printer, and can be initiated by the user interface.
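As a purely illustrative sketch of how the four sub-systems and the feedback loop 610 could be orchestrated in software (the function names, dictionary key, and round limit are assumptions, not the disclosed implementation):

```python
def run_screening(acquire_images, analyze_images, generate_report, max_rounds=3):
    """Acquire, analyze, and report, re-acquiring via the feedback loop while
    the analysis finds the available evidence insufficient."""
    images = []
    result = None
    for _ in range(max_rounds):
        images.extend(acquire_images())       # image acquisition sub-system 604
        result = analyze_images(images)       # image analysis sub-system 606
        if result["sufficient_evidence"]:     # feedback loop 610 exits here
            break
    return generate_report(result)            # screening/diagnosis report sub-system 608
```

The user interface sub-system 602 would start this loop (and could also trigger report display), while each callable stands in for the corresponding sub-system described above.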

FIGS. 7A and 7B provide further details of the processing/control sub-systems. FIG. 7A illustrates a high-level diagram 700 of an image acquisition sub-system. An exemplary image acquisition sub-system 720 is illustrated in more detail in FIG. 7B. In some aspects, it is responsible for moving the imaging device to capture high quality images. The system accommodates a broad range of intrinsic and extrinsic confounding conditions, including but not limited to patient and eye movement during the process; varying appearances of faces, eyes, pupils (particularly pupil diameter), and the retina (if retinal images are to be captured); and various environmental lighting or room conditions. Additionally, in order to maintain the patient's comfort, the procedure duration may be kept as short as possible. Furthermore, mechanisms that can lead to patient discomfort, including but not limited to, extended duration of a bright light pointed towards the eye, may be minimized. This image acquisition sub-system relies on inputs from the imaging device and, in some embodiments, from other sensors including but not limited to additional cameras, proximity sensors (e.g. infrared or laser), or attitude proprioceptive sensors (e.g. accelerometer, gyroscope).

As shown in FIGS. 7A and 7B, the image acquisition sub-system performs a number of tasks: move the imaging device to a location specified by the desired eye (at 704, 724, 726), place the imaging device at a working distance away from said eye (at 706, 728, 730), perform an image quality check (at 708, 734), and in some embodiments, for example where the imaging device is a retinal camera, perform a retinal geography check (at 710, 736). During the procedure, the images may be continuously input into the sub-system at a constant rate and are processed, regardless of the current task.
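The task sequence above could be organized, for example, as follows; this is a hypothetical sketch in which each callable stands in for the corresponding sub-module of FIGS. 7A and 7B, and the retry policy is only one possible choice.

```python
def acquire_gradable_image(center_eye, approach_working_distance, capture,
                           image_quality_ok, retinal_geography_ok, max_attempts=5):
    """Run the acquisition task sequence, re-centering when a later step fails."""
    for _ in range(max_attempts):
        center_eye()                              # task 1 (704, 724, 726)
        approach_working_distance()               # task 2 (706, 728, 730)
        image = capture()
        if not image_quality_ok(image):           # task 3 (708, 734)
            continue                              # fall back to re-centering
        if retinal_geography_ok(image):           # task 4 (710, 736), fundus embodiments
            return image
    return None
```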

In some embodiments, at the start of the process (prior to image acquisition), since the imaging device can be at a random location, as shown by 790 in FIG. 7C, the system must perform the first task to move the imaging device in the X, Y and Z directions to roughly direct it at the target eye, as shown by 792 in FIG. 7C. During this procedure, the imaging device may be maintained at a minimum distance from the target eye to allow for locating the eye by imaging a sufficient portion of the face (as shown, for example, by 902 in FIG. 9). The techniques to solve this task are described in the CENTERING THE EYE section below. This task may be repeated at any time if the patient changes body or head position, saccades, or blinks; in such cases, the system needs to re-center the imaging device such that the target eye is in its field of view.

The system may also re-center the eye in the imaging device's field of view if any of the sub-modules following it repeatedly fails for a pre-specified amount of time.

The second task (at 706, 728, 730) is to move the imaging device toward the eye while maintaining the pupil in the center of the imaging device's field of view. At the end of this phase, the imaging device is at a working distance away from the eye, and in certain embodiments where the imaging device is a fundus camera, portions of the retina are in the field of view (as shown, for example, by 904 in FIG. 9). The techniques to solve this task are described in the MOVING TO WORKING DISTANCE section.

For the hand-held embodiment of the system, it may be assumed that when the system is not being moved by the technician (detected using attitude proprioceptive sensors), it is positioned close to the correct working distance.

The image acquisition sub-system then performs an image quality check (at 708, 734) using techniques described in the IMAGE QUALITY ASSESSMENT section. To increase the robustness of the implementation, one embodiment of the system may capture a number of images while close to the working distance, evaluate the quality of all such images, and subsequently select one or more images with the highest quality for further analysis.

In certain embodiments where the imaging device is a fundus camera, once an image passes the image quality check, a specific embodiment of the system then checks the retinal geography covered by the image (at 710, 736). If the image analysis sub-system determines that the images (or portions thereof) captured thus far do not provide sufficient evidence to generate screening/monitoring/diagnosis outcomes, the image capture process may be repeated to photograph additional regions of the eye as determined based on the disease(s) being screened/monitored/diagnosed. In certain embodiments of the system, the retinal geography check is performed by identifying anatomical structures of interest (including but not limited to, the macula, optic nerve head, and vessels) using techniques described in [SBAB15].

Centering the Eye

The task of centering the imaging device on the desired eye can be solved by making use of proven image-based face detection and facial landmark detection techniques as described in: “Dlib-ml: A Machine Learning Toolkit,” King, Davis E., in: Journal of Machine Learning Research Bd. 10 (2009), Nr. Jul, S. 1755-1758 [King09]; “Deformable Model Fitting by Regularized Landmark Mean-Shift,” Saragih, Jason M.; Lucey, Simon; Cohn, Jeffrey F., in: International Journal of Computer Vision Bd. 91 (2011), Nr. 2, S. 200-215 [SaLC11]; “Deep Convolutional Network Cascade for Facial Point Detection,” Sun, Yi; Wang, Xiaogang; Tang, Xiaoou, in: Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, CVPR '13. Washington, D.C., USA: IEEE Computer Society, 2013—ISBN 978-0-7695-4989-7, S. 3476-3483 [SuWT13]; “Learning Deep Representation for Face Alignment with Auxiliary Attributes,” Zhang, Zhanpeng; Luo, Ping; Loy, Chen Change; Tang, Xiaoou, in: IEEE Transactions on Pattern Analysis and Machine Intelligence Bd. 38 (2016), Nr. 5, S. 918-930.—arXiv: 1408.3967 [ZLLT16]; and “Coarse-to-Fine Auto-Encoder Networks (CFAN) for Real-Time Face Alignment,” Zhang, Jie; Shan, Shiguang; Kan, Meina; Chen, Xilin, in: Computer Vision—ECCV 2014, Lecture Notes in Computer Science: Springer, Cham, 2014—ISBN 978-3-319-10604-5, S. 1-16 [ZSKC14]. Referring to FIG. 8, an example of face landmarking is illustrated. In certain embodiments, the facial detection problem in this system may be constrained in multiple ways: the patient is in front of the imaging device; the patient is in a known frontal pose with a neutral facial expression; and there is minimal occlusion of the face (the head rest may occlude edges of the face). The search region for the system may be further constrained by the head rest 102 as shown in FIG. 8. Additionally, the system can reliably make use of three major facial landmarks: the corners of the eyes, the pupils, and the eyebrows, due to the assumption that only one face, which could be partially occluded, will be present in the images captured at any point during the procedure. These constraints constitute a novel problem not addressed by existing image-based face detection and facial landmark detection techniques.

There exist many proven image-based face detection and facial landmark detection techniques. Some approaches use keypoint-based generic descriptors (e.g. SIFT, HoG, SURF) (“Supervised Descent Method and Its Applications to Face Alignment,” Xiong, Xuehan; De la Torre, Fernando, in: Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, CVPR '13. Washington, D.C., USA: IEEE Computer Society, 2013—ISBN 978-0-7695-4989-7, S. 532-539 [XiDe13]; and “Face Alignment by Coarse-To-Fine Shape Searching,” Zhu, Shizhan; Li, Cheng; Loy, C. C.; Tang, X., in: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, S. 4998-5006 [ZLLT15]) to detect facial features in systems trained for automated face alignment. More recent approaches have employed deep-learning based features [see SuWT13, ZLLT16]. Although these features can be used to fit a template to the face (“Face Detection, Pose Estimation, and Landmark Localization in the Wild,” Zhu, X.; Ramanan, D., in: 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012, S. 2879-2886 [ZhRa12]), a number of approaches [XiDe13, ZLLT15] apply cascaded regression techniques, where landmark locations are iteratively refined, some using deep learning methods (“Deep Recurrent Regression for Facial Landmark Detection,” Lai, Hanjiang; Xiao, Shengtao; Pan, Yan; Cui, Zhen; Feng, Jiashi; Xu, Chunyan; Yin, Jian; Yan, Shuicheng (2015) [LXPC15]). Moreover, a few systems (“Automatic Landmark Detection and 3D Face Data Extraction,” Boukamcha, Hamdi; Hallek, Mohamed; Smach, Fethi; Atri, Mohamed, in: Journal of Computational Science Bd. 21 (2017), S. 340-348 [BHSA17]; and “3D Facial Landmark Detection under Large Yaw and Expression Variations,” Perakis, P.; Passalis, G.; Theoharis, T.; Kakadiaris, I. A., in: IEEE Transactions on Pattern Analysis and Machine Intelligence Bd. 35 (2013), Nr. 7, S. 1552-1564 [PPTK13]) use 3D depth images to provide additional robustness in face landmarking.

In an embodiment, the system may use a multi-level approach to detect the landmarks around the eye. The initial estimation is done using landmark-detection algorithms such as those provided in [King09], along with a deep-learning based solution trained using facial images cropped to show only the eyes. The finer estimation is done using a sliding-window approach with the aforementioned algorithms around the initial estimate of the eye location. After arriving at the final estimate, the imaging device is moved in 3-D space using the estimated location as a target, to center the eye in its field of view. This procedure is performed before approaching the working distance (as discussed in the MOVING TO WORKING DISTANCE section). In some embodiments, the system may utilize additional sensors to assist in determining the distance of the imaging device from the eye and thus the correct working distance.
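As one possible realization of the coarse stage of this multi-level approach, the sketch below uses the dlib toolkit of [King09]; the 68-point landmark model file, the choice of eye, and the mapping from pixel offset to platform motion are assumptions, and the sliding-window refinement and deep-learning stages are omitted.

```python
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # assumed model file

RIGHT_EYE_IDX = range(36, 42)   # patient's right eye in the standard 68-point annotation

def eye_center_offset(gray_image):
    """Return the (dx, dy) pixel offset of the target eye from the image center,
    or None if no face is detected."""
    faces = detector(gray_image, 1)
    if not faces:
        return None
    shape = predictor(gray_image, faces[0])
    pts = np.array([(shape.part(i).x, shape.part(i).y) for i in RIGHT_EYE_IDX], dtype=float)
    eye_center = pts.mean(axis=0)
    h, w = gray_image.shape[:2]
    return eye_center - np.array([w / 2.0, h / 2.0])
```

The returned offset could then be scaled into x/y motion commands for the moving platform so that the eye lands at the center of the imaging device's field of view; z motion toward the working distance follows in the next section.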

Moving to Working Distance

The images captured as the imaging device moves towards the eye (to get to the correct working distance) can be categorized into potentially multiple stages as illustrated in FIG. 9. In some embodiments, at each of the multiple stages the imaging device captures a progressively closer visible field of view of the eye.

At the beginning, the imaging device is sufficiently distant from the face to view the face 902 in sharp focus. As the imaging device moves towards the patient's target eye, a very limited region around the eye is visible in the field of view 904 and the eye is in focus. Closer still to the eye, a large portion of the field of view 906 is covered by the iris of the eye, with the pupil being centered. In certain embodiments where the imaging device is a fundus camera, when the camera moves further in, portions of the retina become visible in the field of view 908, and at the correct working distance, the retinal plane is in focus and retinal vessels are visible and in focus.

Throughout the process of moving towards the working distance, it is important that the pupil stays centered in the image even with the occurrence of sudden eye movements or blinks. In some embodiments, the system may use cooperative landmark tracking and recognition of the eye.

Proven visual tracking algorithms exist, like Mean-Shift tracking, as described in “Kernel-based Object Tracking,” Comaniciu, D.; Ramesh, V.; Meer, P., in: IEEE Transactions on Pattern Analysis and Machine Intelligence Bd. 25 (2003), Nr. 5, S. 564-577 [CoRM03], or KLT, as described in “Good features to track,” Shi, Jianbo; Tomasi, C., in: 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1994, S. 593-600 [ShTo94]. However, these algorithms are developed for problems where the moving object is at near-constant depth from the viewer, which is not always the case in the conditions where embodiments of the present disclosure operate. One approach to tracking involves creating templates of the target object/region (pupil, eye, and immediate facial surroundings) at various depths and recording the smooth changes expected between images captured at different time instances. These templates can utilize many features, including but not limited to textures and key-points and their configurations (see “Object Tracking Using SIFT Features and Mean Shift,” Zhou, Huiyu; Yuan, Yuan; Shi, Chunmei, in: Computer Vision and Image Understanding, Special Issue on Video Analysis. Bd. 113 (2009), Nr. 3, S. 345-352 [ZhYS09]), the histogram of an area ([CoRM03]), or a combination of these.
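For illustration, a minimal mean-shift pupil tracker in the spirit of [CoRM03] could be set up as follows using OpenCV; the initial pupil window is assumed to come from the eye-centering step, and the depth-dependent template updates described above are not shown.

```python
import cv2

def make_pupil_tracker(first_frame_bgr, init_window):
    """init_window = (x, y, w, h) around the pupil in the first frame."""
    x, y, w, h = init_window
    roi = first_frame_bgr[y:y + h, x:x + w]
    hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    # Hue histogram of the pupil/iris region serves as the appearance template.
    hist = cv2.calcHist([hsv_roi], [0], None, [32], [0, 180])
    cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1.0)
    window = list(init_window)

    def track(frame_bgr):
        hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
        back_proj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
        _, new_window = cv2.meanShift(back_proj, tuple(window), criteria)
        window[:] = new_window
        return new_window      # (x, y, w, h) of the tracked pupil region

    return track
```

In a fuller implementation, the histogram template would be re-estimated as the imaging device approaches the eye, since the pupil's apparent size and appearance change substantially with depth.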

“Tracking-Learning-Detection,” Kalal, Z.; Mikolajczyk, K.; Matas, J., in: IEEE Transactions on Pattern Analysis and Machine Intelligence Bd. 34 (2012), Nr. 7, S. 1409-1422 [KaMM12] takes the problem a step further with the Tracking-Learning-Detection (TLD) tracker, which not only detects, tracks, and updates the target's appearance, but also understands the distractors in the environment.

In recent years, deep neural network target trackers (as described in “DeepTrack: Learning Discriminative Feature Representations Online for Robust Visual Tracking,” Li, Hanxi; Li, Yi; Porikli, Fatih, in: IEEE Transactions on Image Processing Bd. 25 (2016), Nr. 4, S. 1834-1848.—arXiv: 1503.00072 [LiLP16]; “Learning Multi-Domain Convolutional Neural Networks for Visual Tracking,” Nam, Hyeonseob; Han, Bohyung, in: arXiv: 1510.07945 [cs] (2015).—arXiv: 1510.07945 [NaHa15]; “Learning a Deep Compact Image Representation for Visual Tracking,” Wang, Naiyan; Yeung, Dit-Yan, in: Burges, C. J. C.; Bottou, L.; Welling, M.; Ghahramani, Z.; Weinberger, K. Q. (Hrsg.): Advances in Neural Information Processing Systems 26: Curran Associates, Inc., 2013, S. 809-817 [WaYe13]; and “Robust Visual Tracking via Convolutional Networks,” Zhang, Kaihua; Liu, Qingshan; Wu, Yi; Yang, Ming-Hsuan, in: arXiv: 1501.04505 [cs] (2015).—arXiv: 1501.04505 [ZLWY15]) have become the state of the art, as they improve performance under more extreme object rotation, viewpoint, and lighting changes. Their common characteristic is that they require online training to allow tracking of objects not included during offline training, which can be slow. However, in an embodiment of the present disclosure, due to the constrained nature of the tracking problem, the training may be done offline or significantly earlier than testing.

In addition to tracking the pupil while the imaging device is moving towards the eye, the system also needs to determine when to stop. The system can tolerate a small error margin from the optimal working distance if it captures multiple images when close to the optimal working distance. In some embodiments, the image acquisition process may have to reset if a sufficient number of gradable images is not available. Therefore, in some embodiments, the entire problem of arriving at the optimal working distance can be reformulated with machine learning, for example as a reinforcement learning (RL) task. Given sensor inputs, a machine learning module, for example an RL module, can directly output motor commands to the moving platform that will move the imaging device exactly to the working distance. Here reinforcement learning simultaneously solves the pupil tracking/detection problem as well as the determination of the optimal working distance. In an additional embodiment, deep reinforcement learning (deep-RL), popularized by “Human-Level Control Through Deep Reinforcement Learning,” Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; u. a., in: Nature Bd. 518 (2015), Nr. 7540, S. 529-533 [MKSR15], allows for end-to-end learning by making use of the representational learning of deep learning and objective learning of reinforcement learning.
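One hedged sketch of such an RL formulation is given below: states combine camera frames with motor encoder readings, actions are small platform motions, and the reward is shaped to peak at the correct working distance with the pupil centered. The environment interface, reward terms, and thresholds are illustrative assumptions, not the disclosed method.

```python
import numpy as np

class WorkingDistanceEnv:
    """Hypothetical environment for learning to reach the working distance."""

    ACTIONS = [(-1, 0, 0), (1, 0, 0), (0, -1, 0), (0, 1, 0),
               (0, 0, -1), (0, 0, 1), (0, 0, 0)]      # unit steps in x, y, z

    def __init__(self, camera, platform, pupil_offset_fn, working_distance_score_fn):
        self.camera = camera                            # assumed frame-grabbing interface
        self.platform = platform                        # assumed motion/encoder interface
        self.pupil_offset_fn = pupil_offset_fn          # e.g., tracker output
        self.score_fn = working_distance_score_fn       # e.g., vessel-focus score in [0, 1]

    def observe(self):
        frame = self.camera.grab()
        encoders = np.asarray(self.platform.encoder_values())
        return frame, encoders

    def step(self, action_idx):
        self.platform.move_relative(*self.ACTIONS[action_idx])
        frame, encoders = self.observe()
        score = self.score_fn(frame)
        offset = self.pupil_offset_fn(frame)
        # Reward a centered pupil and a frame resembling the correct working distance.
        reward = score - 0.01 * float(np.linalg.norm(offset))
        done = score > 0.95
        return (frame, encoders), reward, done
```

A deep-RL agent in the sense of [MKSR15] would then learn a policy mapping these observations to the discrete motor actions, implicitly handling pupil tracking and stop-point determination together.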

In some embodiments, the system of the present disclosure employs a combination of multiple tracking techniques. The system may use the TLD tracker [KaMM12] with its standard features, while also incorporating the face landmarking algorithms of [King09]. In addition, the system may supplement the tracker with proximity sensors. The system may also include a deep-RL system that can make use of other sensory inputs (in addition to the images), such as encoder values from the motors. This way the technique can correlate physical measurements and distances to ground the system and limit the search area.

Image Quality Assessment

The problem of Image Quality Assessment (IQA) is to deem an image as gradable by checking for a number of parameters. In certain embodiments where the image is a retinal fundus image, the parameters to be checked include but are not limited to, sufficient quality for visualization of anatomical structures and lesions, proper focusing of the image, and correct illumination of the image to allow for a clear and focused view of the retinal vessels.

One class of image quality assessment techniques uses features comprising image characteristics information. These may include illumination and sharpness (as described in “Evaluation of Retinal Image Gradability by Image Features Classification,” Dias, João Miguel Pires; Oliveira, Carlos Manta; Cruz, Luís A. da Silva, in: Procedia Technology, 4th Conference of ENTERprise Information Systems—aligning technology, organizations and people (CENTERIS 2012). Bd. 5 (2012), S. 865-875 [DiOC12]); landmark structures like the retinal vessels and the optic nerve head within the image (as described in “Quality Assessment of Retinal Fundus Images Using Elliptical Local Vessel Density,” Giancardo, Luca; Meriaudeau, Fabrice; Karnowski, Thomas P.; Chaum, Edward; Tobin, Kenneth, in: New Developments in Biomedical Engineering, Ed. Domenico Campolo, INTECH. http://sciyo.com/books/show/title/new-developments-in-biomedical-engineering (2010) [GMKC10]; and “Automatic Fundus Image Field Detection and Quality Assessment,” Katuwal, G. J.; Kerekes, J.; Ramchandran, R.; Sisson, C.; Rao, N., in: Image Processing Workshop (WNYIPW), 2013 IEEE Western New York, 2013, S. 9-13 [KKRS13]); or a combination of characteristic measures and structure based information (as described in “Automated Quality Assessment of Retinal Fundus Photos,” Paulus, Jan; Meier, Jörg; Bock, Rüdiger; Hornegger, Joachim; Michelson, Georg, in: International journal of computer assisted radiology and surgery Bd. 5 (2010), Nr. 6, S. 557-564 [PMBH10]; and in [SBAB15]).

Another class of techniques combines such localized features across the whole image ([DiOC12]). However, such a method would not allow for the detection of localized image quality issues and/or artifacts. Structure-based approaches can use raw pixels to create higher-level blobs, which can then be traced back to contours and ultimately structures within the image ([GMKC10]), thus creating a full structural picture of the image that is highly robust.

The newest class of techniques relies on deep learning methods (see “Retinal Image Quality Classification Using Saliency Maps and CNNs,” Mahapatra, Dwarikanath; Roy, Pallab K.; Sedai, Suman; Garnavi, Rahil, in: Machine Learning in Medical Imaging, Lecture Notes in Computer Science: Springer, Cham, 2016—ISBN 978-3-319-47156-3, S. 172-179 [MRSG16]; and “Deep Learning for Automated Quality Assessment of Color Fundus Images in Diabetic Retinopathy Screening,” Saha, Sajib Kumar; Fernando, Basura; Cuadros, Jorge; Xiao, Di; Kanagasingam, Yogesan, in: arXiv:1703.02511 [cs] (2017).—arXiv: 1703.02511 [SFCX17]), where neural networks are trained for image quality assessment on a labeled dataset with images of varying qualities. This approach eliminates the need for hand-crafting features and methods of combining them, while simultaneously encoding notions of blob-level grouping and connectedness.

In some embodiments, the system of the present disclosure is a deep learning based system trained using a labeled dataset of fundus images with varying quality, expanded by data augmentation. In some embodiments, a deep learning based system is used to determine the quality of portions of fundus images.
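As an illustrative example of such a deep learning based gradability classifier (the framework, architecture, and augmentation choices are assumptions, not the disclosed model):

```python
import torch.nn as nn
from torchvision import transforms

# Data augmentation used to expand the labeled fundus dataset (illustrative choices).
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

class GradabilityNet(nn.Module):
    """Small CNN that scores whole images (or crops/portions) as gradable or not."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, 2)   # gradable vs. ungradable

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```

The same architecture could be applied to image tiles rather than whole images to flag localized quality problems such as eyelash shadows or lens smudges.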

Retinal Coverage Assessment

This sub-system is used, in some embodiments, where the images captured are fundus/retinal images. To generate a screening/diagnosis report, the system needs images that visualize a wide region of the retina. Off-the-shelf imaging devices usually have a limited field of view and thus cannot capture all the regions required for generating the report in a single image. This presents the need for multiple images of the retina. The system captures one or more images of different regions of the retina. To ensure that all required regions are present in the images captured by the imaging device, the system will make use of proven algorithms to detect retinal fields (such as, but not limited to, the optic nerve head-centered view and the macula-centered view) in an image that correspond to regions of the retina.

In some embodiments, the system of the present disclosure employs machine learning, for example deep-learning methods, to detect the retinal field present in the image. The deep learning system is trained using a dataset of fundus images labeled with the retinal field they represent, and this dataset is expanded by data augmentation. In conjunction with the deep-learning methods used for image quality assessment, these systems will aggregate gradable images such that the selected images, collectively, represent a view of all retinal fields necessary for providing a screening/diagnosis report.
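A simple sketch of this aggregation step is shown below; the required field names and the classifier interfaces are hypothetical placeholders for the deep-learning models described above.

```python
REQUIRED_FIELDS = {"optic_nerve_head_centered", "macula_centered"}  # assumed field set

def select_covering_images(images, predict_field, is_gradable):
    """Return a field -> image mapping once all required retinal fields are covered
    by gradable images, or None if coverage is still incomplete (triggering
    further capture via the feedback loop)."""
    covered = {}
    for image in images:
        if not is_gradable(image):          # image quality assessment model
            continue
        field = predict_field(image)        # retinal field classification model
        if field in REQUIRED_FIELDS and field not in covered:
            covered[field] = image
    return covered if set(covered) == REQUIRED_FIELDS else None
```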

While embodiments of the present invention have been shown and described, various modifications may be made without departing from the spirit and scope of the present invention, and all such modifications and equivalents are intended to be covered.

As used herein and in the appended claims, the singular forms “a”, “an”, and “the” include plural referents unless the context clearly dictates otherwise.

In the following description and in the figures, like elements are identified with like reference numerals. The use of “e.g.,” “etc.,” and “or” indicates non-exclusive alternatives without limitation, unless otherwise noted. The use of “including” or “includes” means “including, but not limited to,” or “includes, but not limited to,” unless otherwise noted.

As used herein, the term “and/or” placed between a first entity and a second entity means one of (1) the first entity, (2) the second entity, and (3) the first entity and the second entity. Multiple entities listed with “and/or” should be construed in the same manner, i.e., “one or more” of the entities so conjoined. Other entities may optionally be present other than the entities specifically identified by the “and/or” clause, whether related or unrelated to those entities specifically identified. Thus, as a non-limiting example, a reference to “A and/or B”, when used in conjunction with open-ended language such as “comprising” can refer, in one embodiment, to A only (optionally including entities other than B); in another embodiment, to B only (optionally including entities other than A); in yet another embodiment, to both A and B (optionally including other entities). These entities may refer to elements, actions, structures, steps, operations, values, and the like.

Various aspects will be presented in terms of systems that may include several components, modules, and the like. It is to be understood and appreciated that the various systems may include additional components, modules, etc. and/or may not include all the components, modules, etc. discussed in connection with the figures. A combination of these approaches may also be used. The various aspects disclosed herein can be performed on electrical devices including devices that utilize touch screen display technologies and/or mouse-and-keyboard type interfaces. Examples of such devices include computers (desktop and mobile), smart phones, personal digital assistants (PDAs), and other electronic devices both wired and wireless.

In addition, the various illustrative logical blocks, modules, and circuits described in connection with the aspects disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

Operational aspects disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.

Furthermore, the one or more versions may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof to control a computer to implement the disclosed aspects. Non-transitory computer readable media can include but are not limited to magnetic storage devices (e.g., hard disk, floppy disk, magnetic strips . . . ), optical disks (e.g., compact disk (CD), digital versatile disk (DVD), BluRay™ . . . ), smart cards, solid-state devices (SSDs), and flash memory devices (e.g., card, stick). Of course, those skilled in the art will recognize many modifications may be made to this configuration without departing from the scope of the disclosed aspects.

The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present disclosure is not entitled to antedate such publication by virtue of prior disclosure. Further, the dates of publication provided may be different from the actual publication dates, which may need to be independently confirmed.

It should be noted that all features, elements, components, functions, and steps described with respect to any embodiment provided herein are intended to be freely combinable and substitutable with those from any other embodiment. If a certain feature, element, component, function, or step is described with respect to only one embodiment, then it should be understood that that feature, element, component, function, or step can be used with every other embodiment described herein unless explicitly stated otherwise. This paragraph therefore serves as antecedent basis and written support for the introduction of claims, at any time, that combine features, elements, components, functions, and steps from different embodiments, or that substitute features, elements, components, functions, and steps from one embodiment with those of another, even if the following description does not explicitly state, in a particular instance, that such combinations or substitutions are possible. It is explicitly acknowledged that express recitation of every possible combination and substitution is overly burdensome, especially given that the permissibility of each and every such combination and substitution will be readily recognized by those of ordinary skill in the art. In many instances, entities are described herein as being coupled to other entities. It should be understood that the terms “coupled” and “connected” (or any of their forms) are used interchangeably herein and, in both cases, are generic to the direct coupling of two entities (without any non-negligible (e.g., parasitic) intervening entities) and the indirect coupling of two entities (with one or more non-negligible intervening entities). Where entities are shown as being directly coupled together or described as coupled together without description of any intervening entity, it should be understood that those entities can be indirectly coupled together as well unless the context clearly dictates otherwise.

While the embodiments are susceptible to various modifications and alternative forms, specific examples thereof have been shown in the drawings and are herein described in detail. It should be understood, however, that these embodiments are not to be limited to the particular form disclosed, but to the contrary, these embodiments are to cover all modifications, equivalents, and alternatives falling within the spirit of the disclosure. Furthermore, any features, functions, steps, or elements of the embodiments may be recited in or added to the claims, as well as negative limitations that define the inventive scope of the claims by features, functions, steps, or elements that are not within that scope.

What is claimed is:
 1. An automated system for end-to-end eye screening, monitoring, and diagnosing of one or more diseases or disorders comprising: a moving platform; a head rest coupled to the moving platform; a computing platform coupled to the moving platform, wherein the computing platform comprises a user interface sub-system, an image acquisition sub-system, an image analysis sub-system, and a screening and diagnosis report sub-system; an imaging device coupled to the moving platform and configured for capturing one or more images of an eye of a patient; a controller coupled to the moving platform and to the computing platform, wherein the controller is configured for controlling the movement of the imaging device in three-dimensional space; and wherein the image analysis sub-system is configured for analyzing the captured one or more images to provide one or more results of the screening, monitoring, and diagnosing.
 2. The automated system of claim 1 further comprises an electronic display device or a printer for displaying or printing the one or more results of the screening, monitoring, and diagnosing.
 3. The automated system of claim 1 further includes one or more guidance and safety sensors and mechanisms.
 4. The automated system of claim 3, wherein the one or more guidance and safety sensors comprise at least one of a proximity sensor and a proprioceptive sensor.
 5. The automated system of claim 3, wherein a machine learning module receives and uses sensor inputs to output commands to the moving platform to move the imaging device to a working distance from the eye of the patient.
 6. The automated system of claim 1, wherein the controller controls the movement of the imaging device towards a working distance from the eye of the patient in multiple stages.
 7. The automated system of claim 6, wherein at each of the multiple stages the imaging device captures a progressively closer visible field of view of the eye of the patient.
 8. The automated system of claim 6, wherein the automated system tracks a pupil of the eye of the patient while the imaging device is moving towards the eye of the patient.
 9. The automated system of claim 8, wherein the automated system employs a combination of multiple tracking techniques.
 10. The automated system of claim 1, wherein movement of the imaging device is automatically controlled by the image acquisition sub-system via the controller.
 11. The automated system of claim 1, wherein movement of the imaging device is assisted by an operator.
 12. The automated system of claim 11, wherein the operator uses a joystick or a touch-activated electronic display, or voice commands to assist the movement of the imaging device.
 13. The automated system of claim 12, wherein the automated system is activated using one of a button/switch, the touch-activated electronic display, pressure-sensors in the head rest, and a voice-activated command.
 14. The automated system of claim 1, wherein the controller is coupled to a plurality of sets of motors, each set is configured for controlling motion of the imaging device along one of a spatial x-axis, a spatial y-axis, and a spatial z-axis.
 15. The automated system of claim 1, wherein the controller controls the movement of the imaging device within desired safety limits.
 16. The automated system of claim 1, wherein the controller controls the movement of the imaging device in accordance with current dynamics.
 17. The automated system of claim 1, wherein the imaging device captures one or more images of a retina of the eye of the patient, where the one or more images capture different regions of the retina.
 18. The automated system of claim 1, wherein the image analysis sub-system analyzes the captured one or more images based on one or more of visualization of anatomical structures and lesions, proper focusing of the image, and correct illumination of the image to allow for a clear and focused view of the retinal vessels.
 19. The automated system of claim 1, wherein the automated system employs one or more deep-learning methods to detect a retinal field present in the one or more captured images of the eye of the patient.
 20. The automated system of claim 19, wherein the one or more deep-learning methods is a neural network.